Student evaluation of courses and teaching at universities remains a highly contentious and divisive topic. Emotions and anecdotal evidence can overrule conclusions drawn from research on the validity and design of course evaluations. However, even amongst researchers, there is significant disagreement on the efficacy of course and teaching evaluations. This paper explores this ongoing dialogue through the medium of a parliamentary debate, drawing from the breadth of current research on course evaluations.


Introduction 


Student evaluation of courses and teaching is a contentious issue in higher education. Recently, Cote and Allahar (2007) went as far as to assert that professorial fear of student evaluations is a major contributing factor to rampant grade inflation across North America. Controversy centres on the perceived validity of student course/teaching evaluations: are students capable of providing accurate assessments of teaching ability and course content? The answer to this question has practical implications. For faculty, student evaluations can influence promotion and tenure decisions. For students, evaluations may influence course selection and are often the only opportunity they have to provide feedback on the quality of instruction. Furthermore, these evaluations may be growing in importance in a public policy context increasingly concerned with the ‘quality’ of higher education.

This paper, based on a session given at the 2008 Society for Teaching and Learning in Higher Education (STLHE) conference at the University of Windsor, provides an overview of research on student course/teaching evaluation validity, including information about instrument development, interpretation, and factors often understood to influence evaluation results. The session presentation and this paper are both drawn from a larger research project undertaken on behalf of, and with the support of, the Higher Education Quality Council of Ontario (HEQCO).¹

The Great Debate 

Since the assessment of teaching effectiveness is a contentious issue, it is not surprising that research in this area is equally divided. Consequently, we decided that our STLHE session would explore current research on this topic through the oppositional format of a parliamentary debate. We debated the resolution that student course evaluations are a valid and reliable measure of teaching effectiveness for the purposes of summative evaluation. We invited session participants to consider the arguments and evidence presented, offer their own thoughts and experiences through ‘speeches from the floor,’ and vote for the argument they found more compelling through a ‘division of the house.’ The modified format of our session may be found in Table 1.

We have reproduced the speeches of both the Prime Minister (Government) and the Leader of the Opposition below. We do not suggest that there is a clear ‘winner’ in this debate (although the result of the vote during our conference session was against the resolution), but we do point out that there is significant evidence and compelling argumentation on both sides of this issue.

Table 1
Format of the STLHE Session

Government’s opening speech (5 min): Introduce the resolution to be debated, outline the government’s argument, and begin building its case.

Opposition’s speech (5 min): Respond to the resolution, outline the opposition’s argument, respond to the government’s case, and begin building the opposition’s case.

Speeches from the floor (10 min): Opportunity for the honourable members of the assembled House to respond to the government’s and/or opposition’s cases and/or to put questions to either side.

Opposition’s closing remarks (5 min): Response to speeches from the floor and summary of the opposition’s case.

Government’s closing remarks (5 min): Response to speeches from the floor and summary of the government’s case.

Division of the House (5 min): A simple call of ‘yea’ or ‘nay’ will be used to measure the opinion of the House.

Committee of the Whole (10 min): The speaker/chair is removed to allow for more unstructured discussion, that is, a conventional question-and-answer session.

¹ The complete research paper, Student Course Evaluations: Research, Models and Trends (Gravestock & Gregor-Greenleaf, 2008), is available through HEQCO at http://www.heqco.ca/SiteCollectionDocuments/Student%20Course%20Evaluations.pdf

GOVERNMENT (opening remarks) 

Be it resolved that student course evaluations are a 
valid and reliable measure of teaching effectiveness 
for the purposes of summative evaluation. 

Mr. Speaker, this resolution must stand. 

There is general and long-standing agreement in the research that course evaluation instruments can be, and most often are, reliable tools for measuring instructional ability, in that they provide consistent and stable measures for specific items (e.g., an instructor’s organizational skills or relative workload). This is particularly true when the tool is carefully constructed and psychometrically tested before use (for examples, see Abrami, 2001; Theall & Franklin, 2001; Wachtel, 1998; Goldschmid, 1978; Marsh & Roche, 1997; and McKeachie, 1997).

Since the 1970s, scholars have sought to identify characteristics that bias student evaluation ratings; studies have focused on administrative conditions and on course, instructor, and student characteristics. However, in 40 years of research, nothing has been identified that significantly impacts ratings. As Greenwald (1997) notes in his review of the research, the majority of publications produced between 1975 and 1995 favoured validity. McKeachie (1997) argues that student course evaluations are the “single most valid source on teaching effectiveness” (p. 1218). Those who have found course evaluations to be valid have shown that ratings data correlate with other evidence of teaching effectiveness, such as evaluations from colleagues or trained faculty development personnel.

Factors such as class time, discipline, instructor rank and experience, student motivation, course level, and instructor enthusiasm do have a small but measurable impact on evaluation ratings. However, this impact does not reflect bias but rather indicates valid shifts in teaching effectiveness. Moreover, these factors can be taken into account when ratings are interpreted.

The research does show a positive correlation between grades and student ratings. Some instructors interpret this to mean that lenient grading practices can produce inflated ratings. However, Wachtel (1998), Marsh and Dunkin (1992), Murray (1987), and others argue that this positive correlation is simply evidence of student learning: students rate faculty more positively when they have had a positive classroom experience.

Anecdotal evidence also suggests that faculty who assign more course work are penalized by students with low ratings. However, a study by Heckert, Latier, Ringwald-Burton, and Drazen (2006) found that higher evaluations were given to courses in which the difficulty level was viewed as appropriate, and that evaluations were also positive when students indicated they had expended more effort than anticipated. Overall, the study concludes that more demanding instructors received higher evaluations, a finding that refutes both the grading leniency hypothesis and the notion that faculty could ‘buy’ better evaluations with higher grades.

Several decades of research dispel these and countless other myths and misperceptions regarding the validity of student course evaluations. For example, many question the ability of students to evaluate teaching effectiveness accurately, arguing that they are not reliable assessors. Studies dating back to the 1970s consistently demonstrate this to be false, showing that students are reliable and effective at evaluating teaching behaviours (e.g., presentation, clarity, organization, and active learning techniques), the amount they have learned, the ease or difficulty of their learning experience in the course, the workload in the course, and the validity and value of the assessment used in the course (Nasser & Fresko, 2002; Theall & Franklin, 2001; Ory & Ryan, 2001; Wachtel, 1998; Wagenaar, 1995). Scriven (1997) argues that students are “in a unique position to rate their own increased knowledge and comprehension as well as changed motivation toward the subject taught. As students, they are also in a good position to judge such matters as whether tests covered all the material of the course” (p. 2).

Another persistent myth suggests that ratings reflect instructor popularity or personality. The now famous “Dr. Fox” study from the 1970s, which concluded that an instructor’s enthusiasm or personality can impact evaluations, has been widely refuted and discounted on methodological grounds. Ory (2001) argues that “personality” may actually measure teaching behaviours, such as enthusiasm, that may in fact influence teaching effectiveness.

Mr. Speaker, let me mention one final myth not supported by the research: that the majority of faculty object to the use of student course evaluations. Studies demonstrate that this is not the case; rather, a high percentage of faculty hold positive attitudes toward this tool.

OPPOSITION (opening arguments and rebuttal)

Mr. Speaker, let me state clearly that I concede all of the government’s points. I agree that course evaluation instruments offer reliable data and valid measurements of what the questions on the forms ask.

I do not, however, concede the resolution. Rather, I argue that the government has presented an insufficient perspective on validity. As the government has shown, evaluation forms can be, and have been, developed that adequately pre-empt the influence of external biasing factors. However, this internal validity is meaningless if the forms are improperly constructed or used, that is, if student ratings have insufficient construct and consequential validity. I will argue that current course evaluation practice does not provide these types of validity and that, consequently, student course evaluations do not provide a valid measure of teaching effectiveness for the purposes of summative evaluation.

Ory and Ryan (2001) note that “to make valid inferences about student ratings of instruction, the rating items must be relevant to and representative of the processes, strategies, and knowledge domain of teaching quality” (p. 32). For course evaluations to be valid measures of teaching effectiveness, not only must the questions reflect those aspects of teaching identified as effective, but the very definition of effective teaching must be identified and agreed upon. Yet, as Ory and Ryan conclude, no “universal set of characteristics of effective teachers and courses that should be used as a target...appears to exist” (p. 32). Furthermore, educational priorities vary by institution, discipline, and even course. By mandating a generic, prescriptive evaluation instrument, we ensure that evaluations are unresponsive to desired and inevitable variations in teaching styles and goals.


We cannot, therefore, develop an instrument 
that accurately assesses teaching effectiveness because 
we cannot yet identify universal, comprehensive, and 
stable measures of effective teaching. 

Even if appropriate measures of teaching effectiveness could be identified (though I have just shown this to be impossible), there remains another insurmountable obstacle to course evaluation validity: the appropriate interpretation of course evaluation results by faculty and administrators. Menges (2000) argues that “a great many individuals in the assessment area would assert that no matter how valid and reliable the instrument is, consumers can and do misuse the results from it” (p. 8). According to Menges, this misuse, and the consequent compromise to validity, can occur for two primary reasons:

1. Administrators frequently receive too much or too little data to read the forms properly. Individual scores on large numbers of questions present an information overload; conversely, evaluation data are rarely accompanied by information that thoroughly contextualizes them, including descriptions of course activities and goals.

2. Once they do receive the forms, users of course evaluation data are often unclear about the statistical value of evaluation results, frequently overestimating the significance of, for example, the difference between a rating of 3.5 and one of 3.7 on a 5-point scale. Administrators interpreting the data cannot articulate a meaningful distinction between these two scores, and yet are pleased to report that the instructor with a score of 3.7 is a “better” instructor. These statistical challenges are amplified when such comparisons are made across diverse courses or disciplines.
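To illustrate Menges’s second point, the following minimal Python sketch computes Welch’s t statistic and an approximate 95% confidence interval for the difference between two mean ratings of 3.5 and 3.7. The standard deviations (1.0) and class sizes (30 respondents each) are hypothetical assumptions chosen only for illustration, not data from any study cited here, and the interval uses the normal critical value 1.96 as an approximation.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Welch's t statistic and the standard error of (mean_b - mean_a)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / se, se

# Hypothetical inputs: two instructors rated 3.5 and 3.7 on a 5-point scale,
# each with an assumed standard deviation of 1.0 and 30 respondents.
mean_a, mean_b = 3.5, 3.7
t, se = welch_t(mean_a, 1.0, 30, mean_b, 1.0, 30)

# Approximate 95% confidence interval for the difference (normal critical value 1.96).
diff = mean_b - mean_a
low, high = diff - 1.96 * se, diff + 1.96 * se

print(f"t = {t:.2f}, 95% CI for the difference: [{low:.2f}, {high:.2f}]")
# prints: t = 0.77, 95% CI for the difference: [-0.31, 0.71]
```

Under these assumed figures the interval spans zero, so the 0.2-point gap alone would not distinguish the two instructors. With much larger classes or smaller spreads the same gap could become distinguishable, which is one reason the spread and number of respondents behind each mean matter when ratings are compared.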

Because of these insurmountable obstacles to validity, introduced both in the construction of evaluation instruments and in the interpretation of evaluation data, Mr. Speaker, I must reiterate my assertion that student course evaluations are not valid indicators of teaching effectiveness.

OPPOSITION (closing arguments)

Mr. Speaker, let me reiterate that I agree with the Government that teaching evaluations are quite effective at measuring what they seek to measure. I argue, however, that this is a minor, even meaningless, determinant of their validity. Until we can agree on a universal set of effective teaching characteristics, or a universally effective way of organizing and presenting course content, we cannot develop evaluation instruments that effectively capture the infinite varieties of effective teaching, and we risk, as McKeachie (1997) states, “penaliz[ing] the teacher who is effective despite less than top scores on one or more of the dimensions” (p. 1218) of teaching measured on evaluations.

At the other end of the evaluation process are the threats to validity introduced in the interpretation of evaluation results by users who overestimate the precision of evaluation data and fail to properly contextualize student ratings according to the particular circumstances, characteristics, and intentions of individual courses and instructors. For these reasons, Mr. Speaker, I must restate my strong belief that student course evaluations are not valid measures of teaching effectiveness for the purposes of summative evaluation.

GOVERNMENT (closing arguments) 

Mr. Speaker, my esteemed colleague raises many interesting and relevant issues that institutions should bear in mind when developing course evaluation systems; however, let me remind the House that the most essential issue here is that of bias. As numerous empirical studies have shown, bias can be addressed through instrument design, question selection, administration, implementation, and education about interpretation. As Abrami (2001), Franklin (2001), Theall and Franklin (1989, 2001), Kulik (2001), and others note, and we fully agree, education helps to ensure that when the data are used for summative purposes, decisions are fair and equitable.

The issues raised by my colleague do not point to any invalidity in the course evaluation instrument itself but rather to issues affecting the role of teaching in the university more generally, and particularly the evaluation of teaching for summative purposes, including tenure and promotion. Moore and Kuol (2005) argue:

Given that it is an almost universal phenomenon that research activity reaps more individual rewards than those associated with teaching, efforts to measure the teaching related dimensions of [faculty] performance, and to pay attention to those measures in the context of an individual’s professional development helps to create more parity of esteem between the teaching and research components of the academic role. (p. 143)

As such, course evaluations are an essential component in ensuring the recognition of teaching in higher education. The quantifiability and comparability of course evaluations make the imprecise art of evaluating teaching more objective and manageable. As Abrami (2001) argues, no other option provides the same sort of quantifiable and comparable data.

All of this only highlights the need for greater attention to this area, and the best way to provide it is through the continued use of course evaluations.

Conclusion 

During the speeches from the floor, many points were raised both criticizing the use of student course evaluations and supporting their proper use in an academic environment. Although session participants’ comments were evenly split for and against the debate’s resolution, when participants were given the opportunity to vote, the opposition carried the day by a large majority. It is difficult to explain why there was such a clear winner in this debate. It was apparent that some participants were inherently distrustful of student evaluations of courses and teaching and that even research evidence could not dissuade them from long-held beliefs in popular myths and misperceptions about course evaluations. It is also possible that others were swayed by the argument that more work is needed before any teaching assessment tool can be declared ‘valid.’ The varied opinions expressed on this topic during our presentation suggest that the debate over student course evaluations is far from resolved.